Abstract

The Parameter-Less Self-Organizing Map (PLSOM) is a new neural
network algorithm based on the Self-Organizing Map (SOM). It eliminates
the need for a learning rate and annealing schemes for learning rate and
neighbourhood size. We discuss the relative performance of the PLSOM
and the SOM and demonstrate some tasks in which the SOM fails but the
PLSOM performs satisfactorily. Finally we discuss some example
applications of the PLSOM and present a proof of ordering under certain
limited conditions.



1 Introduction

The SOM [Kohonen(1990), Ritter et al.(1992)Ritter, Martinetz, and Schulten] is
an algorithm for mapping from one (usually high-dimensional) space to another
(usually low-dimensional) space. The SOM learns the correct mapping
independent of operator supervision or reward functions that are seen in many
other neural network algorithms, e.g. backpropagation perceptron networks.
Unfortunately this unsupervised learning is dependent on two annealing schemes,
one for the learning rate and one for the neighbourhood size. There is no firm
theoretical basis for determining the correct type and parameters for these
annealing schemes, so they must often be determined empirically. The Generative
Topographic Mapping (GTM) [Bishop et al.(1997)Bishop, Svensen, and Williams,
Bishop et al.(1998)Bishop, Svensen, and Williams, Vellido et al.(2003)Vellido,
El-Deredy, and Lisboa] is one attempt at addressing this. Furthermore, since
these annealing schemes are time-dependent, they prevent the SOM from
assimilating new information once the training is complete. While this is
sometimes a desirable trait, it is not in tune with what we know of the adaptive
capabilities of the organic sensomotor maps which inspired the SOM
[Kaas(1991)]. There have been several attempts at providing a better scaling
method for learning rate and/or neighbourhood size as well as taking some of
the guesswork out of the parameter estimation.
QLD 4001 Australia (j.sitte@qut.edu.au).






1.1 Previous works 



One such attempt was made by Goppert and Rosenstiel [Goppert and
Rosenstiel(1996)], where the SOM is used to approximate a function, and the
approximation is used as a neighbourhood size decay parameter on a per-node
basis. Unfortunately this is not applicable to cases other than function
approximation and it requires knowledge of the desired approximation values
of the function, thus losing the major advantage of the SOM: unsupervised
learning. A more closely related approach would be the Plastic Self Organising
Map (PSOM) [Lang and Warwick(2002)], where the Euclidean distance
from the input to the weight vector of the winning node is used to determine
whether to add new nodes to the map. This is similar to the Growing Neural
Gas (GNG) algorithms [Fritzke(1994), Fritzke(1995)], but maintains
plasticity. Another approach is the Self Organizing with Adaptive Neighbourhood
Neural Network (SOAN) [Iglesias and Barro(1999)] which calculates the
neighbourhood size in the input space instead of the output space like the SOM
variants. The SOAN tracks the accumulated error of each node, and scales
the neighbourhood function accordingly between a minimum and a maximum
value, and like the GNG algorithms it can increase or decrease the number of
nodes. This still leaves several parameters to be determined empirically by the
user. The Time Adaptive Self-Organizing Map (TASOM) [Shah-Hosseini and
Safabakhsh(2000), Shah-Hosseini and Safabakhsh(2003)] addresses the inability
of the SOM to maintain plasticity by keeping track of dynamic learning rates
and neighbourhood sizes for each individual node. The neighbourhood size is
dependent on the average distance between the weight vector of the winning
node c and its neighbours, while the learning rate is dependent only on the
distance between the weight of a given node i and the input, similar to the
standard SOM. The user is still required to select several training parameters
without firm theoretical basis. The Auto-SOM [Haese(1999), Haese and
Goodhill(2001)] uses Kalman filters to guide the weight vectors towards the
centre of their respective Voronoi cells in input space. This automates
computation of learning rates and neighbourhood sizes, but the user is still
required to set the initial parameters of the Kalman filters. Unfortunately it
is more computationally expensive than the SOM, and this problem increases with
input size and number of inputs in the training set. The Auto-SOM also needs to
either keep track of all previous inputs, which makes continuous learning
difficult and increases computational load, or compute the Voronoi set for each
iteration, which would increase computational load and is only feasible if the
input probability density distribution is known. Other recent developments in
self-organisation include the Self-Organizing Learning Array [Starzyk et
al.(2005)Starzyk, Zhu, and Liu] and Noisy Self-Organizing Neural Networks [Kwok
and Smith(2004)].

1.2 Overview 

For these reasons we introduce the Parameter-Less Self-Organizing Map
(PLSOM). The fundamental difference between the PLSOM [Berglund and
Sitte(2003)] and the SOM is that while the SOM depends on the learning rate and
neighbourhood size to decrease over time, e.g. as a function of the number of
iterations of the learning algorithm, the PLSOM calculates these values based
on the local quadratic fitting error of the map to the input space. This allows
the map to make large adjustments in response to unfamiliar inputs, i.e. inputs
that are not well mapped, while not making large changes in response to inputs
it is already well adjusted to. The fitting error is based on the normalised
distance from the input to the weight vector of the winning node in input
space. This value (referred to as $\epsilon$, the lowercase Greek letter
epsilon, throughout this paper) is computed in any case, hence this mechanism
can be implemented without inducing noteworthy increases in the computational
load of the map or hindering parallelised implementations [Campbell et
al.(2005)Campbell, Berglund, and Streit]. In Section 2 we give details of the
PLSOM algorithm, in Section 3 we evaluate its performance relative to the SOM,
in Section 4 we explain the observed behaviour of the PLSOM, in Section 5 we
give examples of applications, Section 6 gives a brief discussion of some
aspects of the PLSOM relative to non-linear mapping problems, and Section 7 is
the conclusion. For mathematical proofs, see the Appendix.



2 Algorithm 

As an introduction and to give a background for the PLSOM we will here give a 
brief description of the SOM algorithm before we move on to the PLSOM itself. 

2.1 The ordinary SOM algorithm 

The SOM variant we will examine is the Gaussian-neighbourhood, Euclidean
distance, rectangular topology SOM, given by Equations (1)-(4). The algorithm
is, in brief, as follows: An input $x(t)$ is presented to the network at time
(or timestep, iteration) $t$. The 'winning node' $c(t)$, i.e. the node with the
weight vector that most closely matches the input at time $t$, is selected
using Equation (1).

$$ c(t) = \arg\min_i \left( \|x(t) - w_i(t)\|_2 \right) \tag{1} $$

$w_i(t)$ is the weight vector of node $i$ at time $t$. $\|\cdot\|_2$ denotes
the $L^2$-norm or n-dimensional Euclidean distance. (The SOM can use other
distance measures, e.g. Manhattan distance.) The weights of all nodes are then
updated using Equations (2)-(4).

$$ w_i(t+1) = w_i(t) + \Delta w_i(t) \tag{2} $$

$$ \Delta w_i(t) = \alpha(t)\, h_{c,i}(t)\, [x(t) - w_i(t)] \tag{3} $$

$$ h_{c,i}(t) = e^{-d(i,c)^2 / \beta(t)^2} \tag{4} $$

$h_{c,i}(t)$ is referred to as the neighbourhood function, and is a scaling
function centred on the winning node $c$ decreasing in all directions from it.
$d(i,c)$ is the Euclidean distance from node $i$ to the winning node $c$ in the
node grid. As is the case with the input/weight distance, the node distance can
be calculated using some other distance measure than the Euclidean distance,
e.g. the Manhattan distance or the link distance, and the grid need not be
rectangular. $\alpha(t)$ is the learning rate at time $t$, $\beta(t)$ is the
neighbourhood size at time $t$.

Lastly the learning rate ($\alpha$) and neighbourhood size ($\beta$) are
decreased in accordance with the annealing scheme. One possible annealing
scheme is given by Equations (5) and (6) for the decrease of the learning rate
and the neighbourhood size, respectively. The important point is that the
annealing scheme relies on the time step number $t$ and not the actual fitness
of the network.

$$ \alpha(t+1) = \alpha(t)\,\delta_\alpha, \quad 0 < \delta_\alpha < 1 \tag{5} $$

$$ \beta(t+1) = \beta(t)\,\delta_\beta, \quad 0 < \delta_\beta < 1 \tag{6} $$

Here $\delta_\beta$ and $\delta_\alpha$ are scaling constants determined
beforehand.

These steps are repeated until some preset condition is met, usually after
a given number of iterations or when some measurement of error reaches a
certain level. The density of the nodes in input space is proportional to the
density of input samples; however, this may lead to undesired results, see
Figure 9. Several variations of the algorithm outlined here exist, e.g. the
Matlab implementation of the SOM uses a two-phased learning algorithm (an
ordering phase and a tuning phase) and a step-based neighbourhood function.
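
The steps of Section 2.1 can be sketched in a few lines of Python. This is a
minimal illustration under stated assumptions, not the implementation used for
the experiments: the function names, the square grid, the uniformly random
initial weights and inputs, and the default values of `alpha`, `beta`,
`delta_alpha` and `delta_beta` are all hypothetical choices made for the
example.

```python
import numpy as np

def som_step(weights, grid, x, alpha, beta):
    """One SOM update (Equations 1-4) for a single input x.

    weights: (n_nodes, dim) weight vectors; grid: (n_nodes, 2) node positions.
    """
    # Equation (1): the winning node has the weight vector closest to x.
    c = np.argmin(np.linalg.norm(weights - x, axis=1))
    # Equation (4): Gaussian neighbourhood over the grid distance d(i, c).
    d2 = np.sum((grid - grid[c]) ** 2, axis=1)
    h = np.exp(-d2 / beta ** 2)
    # Equations (2)-(3): move every node towards x, scaled by alpha and h.
    return weights + alpha * h[:, None] * (x - weights), c

def train_som(n_side=10, dim=2, iters=2000, alpha=0.9, beta=5.0,
              delta_alpha=0.999, delta_beta=0.998, seed=0):
    """Train on uniform inputs in [0, 1] (assumed data, for illustration)."""
    rng = np.random.default_rng(seed)
    gy, gx = np.mgrid[0:n_side, 0:n_side]
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(float)
    weights = rng.random((n_side * n_side, dim))
    for _ in range(iters):
        weights, _ = som_step(weights, grid, rng.random(dim), alpha, beta)
        alpha *= delta_alpha  # Equation (5): anneal the learning rate
        beta *= delta_beta    # Equation (6): anneal the neighbourhood size
    return weights, grid
```

Note that `alpha` and `beta` depend only on the iteration count, never on how
well the map fits the data, which is the property the PLSOM removes.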



2.2 The PLSOM algorithm 

The fundamental idea of the PLSOM is that amplitude and extent of weight
updates are not dependent on the iteration number, but on how well the PLSOM
fits the input data. To determine how good the fit is, we calculate a scaling
variable which is then used to scale the weight update. The scaling variable,
$\epsilon$, is defined in Equations (7) and (8).

$$ \epsilon(t) = \frac{\|x(t) - w_c(t)\|_2}{r(t)} \tag{7} $$

$$ r(t) = \max\left(\|x(t) - w_c(t)\|_2,\; r(t-1)\right), \quad
   r(0) = \|x(0) - w_c(0)\|_2 \tag{8} $$

$\epsilon(t)$ is best understood as the normalised Euclidean distance from the
input vector at time $t$ to the closest weight vector. If this variable is
large, the network fits the input data poorly, and needs a large readjustment.
Conversely, if $\epsilon$ is small, the fit is likely to already be
satisfactory for that input and no large update is necessary.

The algorithm for the PLSOM uses a neighbourhood size determined by
$\epsilon$, thus replacing the equation governing the annealing of the
neighbourhood with $\beta(t) = \text{constant}\ \forall t$. $\beta$ is scaled
by $\epsilon(t)$ in the manner of Equation (9), giving $\Theta(\epsilon(t))$,
the scaling variable for the neighbourhood function, Equation (12).

$$ \Theta(\epsilon(t)) = \beta\,\epsilon(t), \quad
   \Theta(\epsilon(t)) \geq \theta_{min} \tag{9} $$

Equation (9) is not the only option for calculating $\Theta$; another example
is Equation (10).

$$ \Theta(\epsilon(t)) = (\beta - \theta_{min})\,\epsilon(t) + \theta_{min} \tag{10} $$

A third alternative is Equation (11), which is used in generating Figures
23(a)-23(d).

$$ \Theta(\epsilon(t)) = (\beta - \theta_{min}) \ln\left(1 + \epsilon(t)(e - 1)\right)
   + \theta_{min} \tag{11} $$

where ln() is the natural logarithm, $e$ is the Euler number and
$\theta_{min}$ is some constant, usually 0 for Equation (11) or 1 for Equations
(9)-(10). Equation (12) is the neighbourhood function.

$$ h_{c,i}(t) = e^{-d(i,c)^2 / \Theta(\epsilon(t))^2} \tag{12} $$
As before, $d(i,c)$ is a distance measure along the grid, i.e. in output space,
from the winning node $c$ to $i$, which is the node we are currently updating.
This gives a value that decreases the further we get from $c$, and the rate of
decrease is determined by $\epsilon$, as can be seen in Figure 1. The weight
update functions are Equations (13) and (14).

Figure 1: Plot showing the effect of different $\epsilon$ values on the
neighbourhood function.



As we can see from Equation (14), the learning rate $\alpha(t)$ is now
completely eliminated, replaced by $\epsilon(t)$. Thus the size of the update
is not dependent on the iteration number. The only variable affecting the
weight update which is carried over between iterations is the scaling variable
$r(t)$. Practical experiments indicate that $r$ reaches its maximum value after
the first few iterations, and does not change thereafter.
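
The PLSOM update of Section 2.2 can be sketched as follows. This is an
illustrative sketch, not the authors' code: the class name `PLSOM`, its
constructor arguments, the square grid and the uniformly random initial
weights are assumptions made for the example, and the neighbourhood scaling
uses the linear variant of Equation (10) with `theta_min = 1`.

```python
import numpy as np

class PLSOM:
    """Minimal PLSOM sketch on an n_side-by-n_side grid (hypothetical API)."""

    def __init__(self, n_side, dim, beta, theta_min=1.0, seed=0):
        rng = np.random.default_rng(seed)
        gy, gx = np.mgrid[0:n_side, 0:n_side]
        self.grid = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(float)
        self.weights = rng.random((n_side * n_side, dim))
        self.beta = beta            # constant neighbourhood range
        self.theta_min = theta_min
        self.r = None               # scaling variable r(t), Equation (8)

    def step(self, x):
        dist = np.linalg.norm(self.weights - x, axis=1)
        c = int(np.argmin(dist))    # winning node, as in Equation (1)
        # Equation (8): r is the largest winner distance seen so far.
        self.r = dist[c] if self.r is None else max(dist[c], self.r)
        # Equation (7): normalised fitting error, always in [0, 1].
        eps = dist[c] / self.r if self.r > 0 else 0.0
        # Equation (10): neighbourhood size scaled by the fitting error.
        theta = (self.beta - self.theta_min) * eps + self.theta_min
        # Equation (12): neighbourhood function over grid distance d(i, c).
        d2 = np.sum((self.grid - self.grid[c]) ** 2, axis=1)
        h = np.exp(-d2 / theta ** 2)
        # Equations (13)-(14): eps replaces the learning rate alpha(t).
        self.weights += eps * h[:, None] * (x - self.weights)
        return c, eps
```

The only state carried between calls to `step` is `r`; there is no iteration
counter, so training can continue indefinitely with the same rule.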

3 Performance 

The PLSOM completely eliminates the selection of the learning rate, the an- 
nealing rate and annealing scheme of the learning rate and the neighbourhood 
size, which have been an inconvenience in applying SOMs. It also markedly 
decreases the number of iterations required to get a stable and ordered map. 
The PLSOM also covers a greater area of the input space, leaving a smaller gap 
along the edges. 

3.0.1 Comparison to the SOM variants 

We trained the Matlab SOM variant, the SOM and the PLSOM with identical 
input data, for the same number of iterations. The input data was
pseudo-random, 2-dimensional and in the [0, 1] range. This was chosen because a good
pseudo-random number generator was readily available, eliminating the need to 
store the training data. Since the training data is uniformly distributed in the 
input space the perfect distribution of weight vectors would be an evenly spaced 
grid, with a narrow margin along the edges of the input space. That way, each 
weight vector would map an evenly sized area of the input space. 

The PLSOM weight update functions, Equations (13) and (14) of Section 2.2, are:

$$ w_i(t+1) = w_i(t) + \Delta w_i(t) \tag{13} $$

$$ \Delta w_i(t) = \epsilon(t)\, h_{c,i}(t)\, [x(t) - w_i(t)] \tag{14} $$

Figure 2: Graph of the decrease of uncovered space as training progresses for
the PLSOM, the SOM and the Matlab SOM implementation (horizontal axis:
iterations x1000). Note the quick expansion of the PLSOM and that it
consistently covers a larger area than the SOM variants.

Figure 3: Graph of the average skew for the PLSOM, the SOM and the Matlab
SOM implementation. For the first 24000 iterations the PLSOM is more
ordered, before the SOM variants narrowly overtake it.

In comparing the two SOM implementations we used 3 separate quality
measures, which are all based on the shape and size of the cells. A cell is the
area in the input space spanned by the weight vectors of four neighbouring
nodes.

Unused space

We summed the area covered by all the cells, and subtracted this
from the total area of the input space. The resulting graph clearly shows
how the PLSOM spans a large part of the input space after only a small
number of iterations and maintains the lead throughout the simulation
(Figure 2). Please note that this quality measure will be misleading in
situations where cells are overlapping, but this will typically only occur in
the first few thousand iterations.

Average skew

For each cell we calculate the length of the two diagonals in a cell and
divide the bigger by the smaller and subtract one, thus getting a number
from 0 to infinity, where 0 represents a perfectly square cell. Again, we see
that the PLSOM outperforms the SOM in the early stages of simulation
but after ca. 24000 iterations the SOM surpasses the PLSOM. After 100000
iterations the difference is still small, however. See Figure 3.

Deviation of cell size

We calculate the absolute mean deviation of the cell size and divide it by
the average cell size to get an idea of how much the cells differ in relative
size. Here the SOM is superior to the PLSOM after ca. 10000 iterations,
mainly because of the flattened edge cells of the PLSOM, see Figure 4.
If we ignore the cells along the edge, the picture is quite different: the
PLSOM outperforms the SOM with a narrow margin, see Figure 5.

Figure 4: Graph of the absolute mean deviation of cell size for the PLSOM, the
SOM and the Matlab SOM (horizontal axis: iterations x1000). The PLSOM is more
regular up until ca. iteration 10000.

Figure 5: Graph of the absolute mean deviation of cell size for the PLSOM, the
SOM and the Matlab SOM, excluding the edge cells (horizontal axis: iterations
x1000). Compare to Figure 4. The PLSOM outperforms the Matlab SOM in both
adaptation time and accuracy, and the SOM needs until ca. iteration 30000 to
reach the same level of ordering.
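
The three quality measures can be computed directly from the weight grid. The
sketch below is one possible reading of the definitions above, with assumed
details: the function name `cell_measures` is hypothetical, the weights are
taken to be stored as a `(rows, cols, 2)` array, cell areas are computed with
the shoelace formula (which, as noted for the unused-space measure, is only
meaningful for non-overlapping cells), and degenerate cells with a zero-length
diagonal are not handled.

```python
import numpy as np

def cell_measures(W, input_area=1.0):
    """Return (unused space, average skew, relative deviation of cell size).

    W: (rows, cols, 2) array of 2-D weight vectors laid out on the node grid.
    """
    # Corners of every cell: a-b-c-d traversed around the quadrilateral.
    a, b = W[:-1, :-1], W[:-1, 1:]
    c, d = W[1:, 1:], W[1:, :-1]

    def cross(p, q):
        return p[..., 0] * q[..., 1] - p[..., 1] * q[..., 0]

    # Shoelace formula for the area of each quadrilateral cell.
    area = 0.5 * np.abs(cross(a, b) + cross(b, c) + cross(c, d) + cross(d, a))

    # Skew: longer diagonal over shorter diagonal, minus one (0 = square).
    diag1 = np.linalg.norm(c - a, axis=-1)
    diag2 = np.linalg.norm(d - b, axis=-1)
    skew = np.maximum(diag1, diag2) / np.minimum(diag1, diag2) - 1.0

    unused = input_area - area.sum()
    deviation = np.mean(np.abs(area - area.mean())) / area.mean()
    return unused, skew.mean(), deviation
```

For a perfectly regular grid both the skew and the deviation are zero, and the
unused space is simply the margin left between the outermost weights and the
border of the input space.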



3.0.2 Plasticity preservation 

The illustrations in this section show the positions of the weight vectors,
connected with lines, in the input space. When a SOM has been trained, it will
not adapt well to new data outside the range of the training data, even if a
small residual learning rate is left. This is illustrated by Figure 6, where a
SOM has been presented with pseudo-random, uniformly distributed 2-dimensional
data vectors in the [0, 0.5] range for 50000 iterations. Thereafter the SOM was
presented with 20000 pseudo-random, uniformly distributed 2-dimensional data
vectors in the [0, 1] range, after which the SOM has adapted very little to the
new data. In addition the adaptation is uneven, creating huge differences in
cell size and distorting the space spanned by the weight vectors. If we subject a
PLSOM to the same changes in input range, the difference is quite dramatic; it
adapts correctly to the new input range almost immediately, as seen in Figure 7.

Figure 6: SOM first trained with inputs ranging from 0 to 0.5 for 50000
iterations shown after 20000 further training iterations with inputs ranging
from 0 to 1.0.

Figure 7: PLSOM first trained with inputs ranging from 0 to 0.5 for 50000
iterations shown after 20000 further training iterations with inputs ranging
from 0 to 1.0. Note the difference between this and Figure 6.

3.0.3 Memory 

In the opposite case, viz. the SOM is presented with a sequence of inputs 
that are all restricted to a small area of the training input space, it would be 
preferable if the SOM maintains its original weight vector space, in order to not 
'forget' already learned data. Figure 8 demonstrates what happens to a PLSOM
if it is trained with pseudo-random, uniformly distributed 2-dimensional data 
in the [0, 1] range for 50000 iterations and then presented with inputs confined 
to the [0, 0.5] range for 20000 iterations. This leads to an increase of the density 
of weight vectors in the new input space, yet maintains coverage of the entire 
initial input space, resulting in distortions along the edge of the new input space. 
Both these effects are most pronounced in the PLSOM. 

3.1 Drawbacks 

The PLSOM is measurably less ordered than a properly tuned SOM and the edge
shrinking is also more marked in the PLSOM. The PLSOM does not converge
in the same manner as the SOM (there is always a small amount of movement),
although this can be circumvented by not performing new weight updates after
a satisfactory fit has been established.

Figure 8: PLSOM first trained with inputs ranging from 0 to 1 for 50000
iterations shown after 20000 further training iterations with inputs ranging
from 0 to 0.5. Note that while the weights have a higher density in the new
input space, the same area as before is still covered, i.e. none of the old
input space has been left uncovered.

4 Analysis 

This section highlights a special case where the SOM fails but the PLSOM 
succeeds, and explores the causes of this. 

4.1 Experiments 

We have applied the PLSOM and two variants of the SOM to the same problem:
mapping a non-uniformly distributed input space. As input space we used a
normally distributed pseudo-random function with a mean of 0.5 and standard
deviation of 0.2. Values below 0 or above 1 were discarded. The same random
seed was used for all experiments and for initialising weights. A SOM variant
that uses the same neighbourhood function as the PLSOM and an exponential
annealing scheme for learning rate and neighbourhood size, here denoted
'plain SOM', was used for comparison. As can be seen from Figure 9, the SOM
is severely twisted when we try a 20-by-20 node rectangular grid. The size of
the ordinary SOM algorithm must be reduced to 7-by-7 before all traces of this
twisting are removed. Altering the annealing time does not solve the problem.
The PLSOM on the other hand performs well with the initial size of 20-by-20
nodes, filling the input space to over 77%, see Figure 10.
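
The input distribution used in this experiment can be generated by rejection
sampling, sketched below. The function name `clipped_normal` is hypothetical,
and the sketch assumes that a 2-dimensional sample is discarded as a whole when
any of its components falls outside [0, 1]; the text does not say whether
rejection was applied per component or per vector.

```python
import numpy as np

def clipped_normal(n, dim=2, mean=0.5, std=0.2, seed=0):
    """Draw n normally distributed vectors, discarding any outside [0, 1]."""
    rng = np.random.default_rng(seed)  # fixed seed for repeatable experiments
    out = np.empty((0, dim))
    while len(out) < n:
        batch = rng.normal(mean, std, size=(n, dim))
        # Keep only vectors whose components all lie inside [0, 1].
        keep = np.all((batch >= 0.0) & (batch <= 1.0), axis=1)
        out = np.vstack([out, batch[keep]])
    return out[:n]
```

Using the same `seed` for every run mirrors the use of one random seed for all
experiments.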

Figure 9: Ordinary SOM after 100000 iterations of normally distributed input
with mean 0.5, standard deviation 0.2, clipped to the [0, 1] interval. Note that
two nodes which are close in input space may not be close on the map.

Figure 10: PLSOM after 100000 iterations of normally distributed input with
mean 0.5, standard deviation 0.2, clipped to the [0, 1] interval. While the
correspondence between weight vector density and input density is weaker than
for the SOM, the topology is preserved. Compare to Figure 9. See also Figure 15.

Figure 11: Update size x likelihood for a corner node v of a 20x20 node ordinary
SOM algorithm. The position of v in the input space is marked by a vertical
white line. The position of v in the map is (1,1).

4.2 Explanation

This phenomenon can be explained by looking at the likelihood of a given input
in relation to the size of the weight update this input will result in, i.e. the
expected update given the input distribution. The likelihood of an input
occurring is governed by the Gaussian probability density function. The
likelihood $p$ of an input occurring in the interval $\langle z_1, z_2 \rangle$,
where $z_1 < z_2$, is approximated using the error function, erf:

$$ p(z_1, z_2) = \frac{1}{2}\left(\mathrm{erf}\left(\frac{z_2}{s}\right)
   - \mathrm{erf}\left(\frac{z_1}{s}\right)\right) \tag{15} $$

where $s$ is a scaling constant to account for the standard deviation. An
analysis of the expected update of a given node is given by Equation (16):

$$ \xi(\Delta w, x) = \Delta w(x)\, p(x) \tag{16} $$

where $\xi(\Delta w, x)$ is the expected displacement of weight vector $w$
given $x$ as input, $\Delta w(x)$ is the displacement of $w$ given $x$ and
$p(x)$ is the probability density of this input.

By discretising this over a 2-dimensional n-by-n grid, $N$, we can plot an
approximation of the expected displacement for each square of $N$, as seen in
Figures 11 and 12.
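
The discretisation of Equation (16) can be sketched as follows. The code is an
illustration under assumptions, not the procedure used to produce the figures:
the function names are hypothetical, the two input components are assumed
independent with the same mean and standard deviation, the interval probability
uses the standard normal-interval formula with explicit mean and standard
deviation (which may be parametrised differently from the constant $s$ in
Equation (15)), and `delta_w` is a user-supplied callable returning the
displacement of the node of interest for an input `x`.

```python
import math
import numpy as np

def interval_prob(z1, z2, mean=0.5, std=0.2):
    """Probability that a normal variable falls in the interval (z1, z2)."""
    s = std * math.sqrt(2.0)
    return 0.5 * (math.erf((z2 - mean) / s) - math.erf((z1 - mean) / s))

def expected_displacement(delta_w, n=20):
    """Discretise Equation (16) on an n-by-n grid over [0, 1]^2.

    Returns an (n, n, 2) array: displacement at each cell centre multiplied
    by the probability of an input falling inside that cell.
    """
    edges = np.linspace(0.0, 1.0, n + 1)
    p1 = np.array([interval_prob(lo, hi)
                   for lo, hi in zip(edges[:-1], edges[1:])])
    centres = 0.5 * (edges[:-1] + edges[1:])
    field = np.zeros((n, n, 2))
    for i, cy in enumerate(centres):
        for j, cx in enumerate(centres):
            # Independent components: the cell probability is a product.
            disp = np.asarray(delta_w(np.array([cx, cy])))
            field[i, j] = disp * p1[i] * p1[j]
    return field
```

Summing the returned array over both grid axes gives a single vector that
approximates the total expected displacement of the node over the whole input
space.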

When comparing the expected update of a Matlab SOM algorithm and a
PLSOM we see that the PLSOM edge nodes receive a far larger amount of their
update from outside the area covered by the map than their Matlab counterparts,
thus making sure that the expansion outwards is even and less jerky.

To get a clearer picture, we need to integrate the expected displacement over
the entire input space $\Omega$ which contains all possible inputs $x$, giving
Equation (17):

$$ \zeta(\Delta w) = \int_{\Omega} \Delta w(x)\, p(x)\, dx \tag{17} $$

Discretising the integrated expected displacement gives a vector for each
node in the map, indicating how much and in which direction it is likely to be
updated given the input distribution, as shown in Figures 13 and 14.

As we can see from Figure 13, this vector is greater for the corner node
than for the side node in the SOM algorithm, while the opposite is true for the
PLSOM, as seen in Figure 14. This leads the corner nodes in the SOM algorithm
to expand outwards faster than the side nodes, thus creating the warping. In
the PLSOM, the side nodes expand outward faster, creating an initial 'rounded'
distribution of the weights, but subsequent inputs pull the corners out. Also
note that the edge nodes of the PLSOM are only marginally pulled inwards by
inputs inside the weight grid, since the amount of update depends on the
distance from the input to the closest node, not only on the distance from the
node in question. This contributes to the quicker, more even expansion.

Figure 12: Update size x likelihood for a corner node v of a 20x20 node PLSOM.
The position of v in the input space is marked by a vertical white line. The
position of v in the map is (1,1).

Figure 13: The expected displacement vectors for the edge nodes along one edge
of an ordinary SOM. Note that the vectors are changing direction abruptly from
node to node, causing the warping.

Finally, the weight update functions of the different algorithms give us the
last piece of the explanation. Consider a map that receives an input far outside
the area it is currently mapping, after already being partly through its annealing,
and therefore partially ordered.

When this happens to an ordinary SOM the update will be large for the
winning node, but because the size of the neighbourhood function is so small
the neighbours of the winning node will receive only a very small update. If the
same situation occurs to a PLSOM, the neighbourhood size will scale (since it
is dependent on the distance from the input to the winning node) to include
a larger update for the neighbours of the winning node, thus distributing the
update along a larger number of the edge nodes.

Figure 14: The expected displacement vectors for the edge nodes along one edge
of a PLSOM.

Figure 15: Weight density vs. distance from centre for the SOM and the
PLSOM. The 2-dimensional input was normally distributed with a 0.5 mean and 0.2
standard deviation. Observe that while the PLSOM has less correlation
between input density and weight density, it has far less variance and covers a
larger area. See also Figures 9 and 10.

It should be pointed out that mapping a large portion of the input space
while preserving such a skewed distribution is not possible: the difference
between the length along the edges and the length through the centre is too
great to preserve neighbourhood relations. When faced with this type of
high-variance input distribution, one is faced with the choice of which
property to sacrifice: neighbourhood consistency or density equivalence. The
SOM tries to do both, and fails. GNG, PSOM and similar algorithms do both at
the cost of ending up with arbitrary network connections. The PLSOM is unique
in preserving neighbourhood relations for a pre-defined network. This comes at
a cost of poorer correspondence between input density and weight density, as
can be seen from Figure 15.

5 Applications 

The PLSOM has been applied to three familiar problems by the authors. These 
applications will only be explored briefly here, in the interest of not distracting 
from the main subject of this article. 

5.1 Sound source localisation through active audition 

This application deals with processing a stereo sound signal, presenting it to a
PLSOM or SOM to determine the direction of the sound source and orienting
the microphones towards the sound source. This application illustrates that the
PLSOM can deal with cases where the number of input dimensions (512) is far
higher than the number of output dimensions (2).

5.1.1 Earlier works

The physics of binaural audition is discussed in [King and Carlile(1995), Nandy
and Ben-Arie(1995), Hartmann(1999)] and reinforcement learning in
[Sutton(1992)]. Early works include [Huang et al.(1995)Huang, Ohnishi, and
Sugie, Huang et al.(1997)Huang, Ohnishi, and Sugie], but it is important to
distinguish between passively determining the direction of sound sources and
active audition. Active audition aims to use the movement of the listening
platform to pinpoint the location, in the same way biological systems do.
Active audition is therefore an intrinsically robotic problem. Other active
audition works can be grouped into subcategories: First we have applications
that rely on more than two microphones to calculate the source of a sound,
e.g. [Rabinkin et al.(1996)Rabinkin, Renomeron, Dahl, French, Flanagan, and
Bianchi]. While these are certainly effective, we observe that in nature two
sound receivers are sufficient. Since microphones consume power and are a
possible point of failure, we see definite advantages to being as frugal as
nature in this respect. Secondly we have methods relying on synergy of visual
and auditory cues for direction detection, most notably by members of the SIG
Humanoid group [Nakatani et al.(1994)Nakatani, Okuno, and Kawabata, H. Kitano
and H. G. Okuno and K. Nakadai and T. Matsui and K. Hidai and T. Lourens(2002),
Nakadai et al.(2000)Nakadai, Lourens, Okuno, and Kitano, Lourens et
al.(2000)Lourens, Nakadai, Okuno, and Kitano, Nakadai et al.(2002)Nakadai,
Okuno, and Kitano, Nakadai et al.(2003)Nakadai, Okuno, and Kitano]. Some of
these also include neural networks and even SOMs, such as [Rucci et
al.(1999)Rucci, Edelman, and Wray, Nakashima et al.(2002)Nakashima, Mukai, and
Ohnishi]. However, it is known that even humans that are born blind can
accurately determine the direction of sounds [Zwiers et al.(2001)Zwiers,
Opstal, and Cruysberg], so interaction between vision and hearing cannot be a
crucial component of learning direction detection. Our implementation is unique
in that it does not rely on visual cues, specialised hardware or any predefined
acoustic model. It learns both the direction detection and the correct motor
action through unsupervised learning and interaction with its environment. It
also incorporates a larger number of measures than other methods, which is made
possible through the SOM/PLSOM ability to find patterns in high-dimensional
data.

5.1.2 System description 

The aim is to let the process be completely self-calibrating - all that is needed 
is to provide a set of sound sources, and the algorithm will figure out on its 
own where the sound is coming from and how to orient towards it. This is done 
using a pipelined approach: 

1. Digital stereo samples are streamed from a pair of microphones. 

2. For each sampling window, which is 512 samples long, we compute the Fast 
Fourier Transform (FFT) of the signal. This is averaged to 64 subbands. 

3. For each sampling window we compute Interaural Time Difference (ITD),
Interaural Level Difference (ILD), Interaural Phase Difference (IPD) and
Relative Interaural Level Difference (RILD). ILD, IPD and RILD are
based on the FFT, so that we have one value for each of the 64 subbands.

4. The resulting 256-element vector is presented to a PLSOM.

5. The position of the winning node is used as index into a weight matrix
which selects the appropriate motor action.

6. If the sound volume is over a given threshold, the selected motor action is
carried out.

7. After a short delay, the algorithm checks whether the winning node has
moved closer to the centre of the map, and uses this to calculate a reward
value for the reinforcement learning module.

Figure 16 illustrates this procedure. The pipelined approach has the advantage
of making experimenting with different processing paths much simpler. It also
lends the approach to parallelising hardware and software implementations. In
order to train the PLSOM a number of samples are recorded. We used white
noise samples from 38 locations in front of the robot. The samples were recorded
at 50 and 300 cm distance from the source, with 10° horizontal spacing. The
training algorithm then presents a few seconds of random samples to the system
for 10000 training steps. Each sample is 256-dimensional, and a new sample is
presented every 32 ms. This sensitises each node to sound from one direction
and distance. The latter part of the training is done online, with the robot
responding to actual sound from a stationary speaker in front of it. Initially the
robot head is pointed in a random direction, and the training progresses until
the robot keeps its head steadily pointed towards the speaker, at which time a
new random direction is picked and the training continues.

Figure 16: Active audition system layout
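
Steps 2 and 3 of the pipeline can be sketched for two of the four measures.
This is a hedged illustration only: the text does not give the exact
definitions of ILD, IPD, RILD or ITD, so the function name
`binaural_features`, the decibel form of the ILD, the cross-spectrum form of
the IPD and the way FFT bins are grouped into subbands are all assumptions
made for the example. ITD and RILD are omitted.

```python
import numpy as np

def binaural_features(left, right, n_subbands=64, eps=1e-12):
    """Per-subband ILD (dB) and IPD (radians) for one 512-sample window."""
    # Real FFT of each channel; dropping the DC bin leaves 256 bins.
    L = np.fft.rfft(left)[1:]
    R = np.fft.rfft(right)[1:]
    # Group neighbouring bins so that each row is one subband.
    L = L.reshape(n_subbands, -1)
    R = R.reshape(n_subbands, -1)
    # Assumed ILD: ratio of mean magnitudes per subband, in decibels.
    mag_l = np.abs(L).mean(axis=1)
    mag_r = np.abs(R).mean(axis=1)
    ild = 20.0 * np.log10((mag_l + eps) / (mag_r + eps))
    # Assumed IPD: phase of the averaged cross-spectrum per subband.
    ipd = np.angle((L * np.conj(R)).mean(axis=1))
    return ild, ipd
```

Concatenating such per-subband vectors yields a single high-dimensional input
vector of the kind presented to the PLSOM in step 4.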

5.1.3 Results 

Our approach described above consistently manages an accuracy of around 5° , 
which is comparable to human acuity. The precision was determined by keeping 
the robot stationary and registering the winning node of the PLSOM. We then 
moved the sound source horizontally until the winning node stabilised on one 
of the immediate neighbours of the initial winning cell, noting how much the 
source had to be moved. The relative accuracy of the method is demonstrated 
in Figure 17. The graphs were generated using a set of recordings of white noise 
at a distance of 1 metre. The PLSOM is, as we can see, almost free of deviation. 

Figure 17: High dimensional data: Average winning node vs. actual angle using 
the PLSOM method. The grey area is one standard deviation. 

This enables one to estimate the direction using a small number of samples, i.e. 
quickly. 

5.2 Inverse Kinematics 

Inverse Kinematics (IK) is the problem of determining the joint parameters of 
a robotic limb for some given position. This problem is interesting in evaluat- 
ing the PLSOM because it involves mapping between two spaces with wildly 
different topologies, a half-torus shaped space in Euclidean space and a roughly 
wedge-shaped space in joint parameter space. 

5.2.1 Existing methods 

Robot control depends on IK but there are several problems associated with the 
existing methods. The Jacobian pseudo-inverse lets one solve the problem com- 
pletely, but is computationally expensive, relatively complicated and unstable 
around singularities. The Jacobian transpose is faster and simpler, but not 
particularly accurate and does not move in a shortest path like the pseudo-inverse. 
Other methods, like Cyclic Coordinate Descent [Wang and Chen(1991)], which 
solve the unstable singularity problem, have been proposed. There have also been 
solutions of the IK problem using SOMs [Ritter et al.(1992)Ritter, Martinetz, 
and Schulten], which let a SOM learn the joint angles for a number of points 
in space, and an approximation of the inverse Jacobian in the vicinity of each 
point. This gives good results with relatively few nodes but it relies on infor- 
mation being stored in the nodes of the SOM, rather than holistically in the 
network, leading each node to be more complicated than necessary. It is not 
clear whether using an approximated inverse Jacobian can lead to the same sort 
of instability around singularities as with the pseudo-inverse Jacobian. Even 
so, our method borrows heavily from the SOM approach and should be seen in 
relation to it. 

5.2.2 Proposed solution 

We opted for a slightly different and, we believe, novel approach using the 
PLSOM. Each node maps a point in 3D space to a point in joint parameter 
space. The map is trained through generating a random joint configuration 
and updating the PLSOM weights with it. After the training is complete each 
node is labelled with the manipulator position that will result from applying 
the weight vector to the joint parameters. This allows the manipulator to be 
positioned in accordance with the following simple algorithm: 

1. Select the node closest to the desired point in space by comparing node 
labels. This step can be greatly accelerated by starting the search at the 
last node used. 

2. Select the neighbours of the node so that we can create three almost 
orthogonal vectors in 3D space. This is trivial given the lattice structure 
of the PLSOM. 

3. The 3 vectors in 3D space are then orthogonalised using the Gram-Schmidt 
algorithm. 

4. The analogous steps are carried out on the 3 corresponding vectors in joint 
parameter space. 

5. The resulting vectors can now be used to interpolate the joint parameters. 

5.2.3 Experiments and Results 

The PLSOM solves the IK problem very quickly, as no iteration is necessary. 
One can input the desired target position and immediately get an approximation 
of the joint parameters that achieve this target. However, the level of precision 
depends on the number of nodes in the network. As a demonstration of the 
capabilities of the PLSOM method a 6 DOF robotic arm was programmed to 
play chess against itself. A PLSOM with 3600 nodes was trained in a 
half-torus shaped area in front of the robot covering a small chessboard and the 
surrounding table; 30000 training iterations are completed in less than 5 minutes 
on a low-end desktop PC. Even with the relatively few nodes the error is well 
below the mechanical error in the robotic arm. The robot is able to quickly 
and accurately pick and place the chess pieces, as shown in the short video clip 
in [Berglund(2004)]. 
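
The lookup and interpolation steps of Section 5.2.2 can be sketched as follows. 
This is a minimal illustration and not the authors' implementation: the function 
name and argument layout are hypothetical, and we assume that the "analogous 
steps" in joint parameter space mean applying the same Gram-Schmidt coefficients 
to the joint-space vectors as to the 3D vectors. The winning node and its three 
neighbours are assumed to have been selected already. 

```python
import numpy as np

def interpolate_joints(target, label_c, labels_nb, weight_c, weights_nb):
    """Estimate joint parameters for `target` from one node and three neighbours.

    label_c    : (3,) manipulator position stored as the winning node's label
    labels_nb  : (3, 3) labels of three lattice neighbours, one per row
    weight_c   : (d,) joint parameters (weight vector) of the winning node
    weights_nb : (3, d) joint parameters of the same three neighbours
    """
    label_c = np.asarray(label_c, dtype=float)
    weight_c = np.asarray(weight_c, dtype=float)
    # Edge vectors from the winning node to its neighbours, in both spaces.
    space = np.asarray(labels_nb, dtype=float) - label_c
    joint = np.asarray(weights_nb, dtype=float) - weight_c

    # Gram-Schmidt on the 3D vectors; the identical linear combinations are
    # applied to the joint-space vectors (assumed reading of "analogous steps").
    basis_s, basis_j = [], []
    for s, j in zip(space, joint):
        for e, g in zip(basis_s, basis_j):
            coeff = np.dot(s, e) / np.dot(e, e)
            s = s - coeff * e
            j = j - coeff * g
        basis_s.append(s)
        basis_j.append(j)

    # Project the offset to the target onto the orthogonal basis and carry
    # the same coefficients over to joint space.
    offset = np.asarray(target, dtype=float) - label_c
    result = weight_c.copy()
    for e, g in zip(basis_s, basis_j):
        result += (np.dot(offset, e) / np.dot(e, e)) * g
    return result
```

If the relation between joint parameters and position were exactly linear in 
the neighbourhood of the node, this interpolation would be exact; for a real 
manipulator it is only a local approximation, which is consistent with the 
precision depending on the number of nodes. 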

In order to assess the different IK methods, we performed a simulation 
wherein a 3 DOF robot arm is moved from one position to another. Each 
method is allowed 500 iterations to complete this and for each iteration the 
error is calculated. We performed this test with a PLSOM with 3600 nodes, a 
PLSOM with 36400 nodes and a PLSOM with 230400 nodes. We then repeated 
the experiment for 4 different target positions and averaged the error. The 
result is displayed in Figure 18. As we can see from the graph, the PLSOM's 
accuracy is related to the number of nodes. It should be noted that training 
time is roughly O(n) in number of nodes n, so increasing accuracy is not 
computationally expensive. Accuracy is slightly better than what is reported 
in [Ritter et al.(1992)Ritter, Martinetz, and Schulten], i.e. 0.02% error 
compared to 0.06% error, although this is not surprising given the large difference 
in number of nodes. The execution speed was measured and averaged over all 
the experiments; the difference in execution speed for the 3600 node network 
and the 230400 node network is typically less than 2%, e.g. 29.52 μs and 29.98 μs 
on a low-end desktop PC. 

Figure 18: Error after 500 iterations of some different sized PLSOMs. 



5.3 Classification of the ISOLET data set. 



In order to test whether the PLSOM can handle very high-dimensional and 
clustered data, we selected the ISOLET [Cole et al.(1990)Cole, Muthusamy, 
and Fanty] data set for analysis. The data set contains suitably processed 
recordings of 150 speakers saying the name for each of the letters in the alphabet, 
twice. The set is subdivided into the training set, containing 120 speakers, 
and the test set with the remaining 30 speakers. Algorithms seeking to solve 
this problem are trained on the training set and evaluated on the test set. 
The data is presented as 617-dimensional real-valued vectors with elements in the 
[-1, 1] range representing various preprocessed properties of the sound, see [Cole 
et al.(1990)Cole, Muthusamy, and Fanty] for details. The best result reported 
without any further signal processing is 95.83% [Dietterich and Bakiri(1991)] 
and 95.9% [Fanty and Cole(1990)], achieved with backpropagation. k-Nearest 
Neighbour [Fix and Hodges(1951)] methods achieve from 88.58% (k = 1) to 
92.05% (k = 5, ambiguities resolved by decreasing k) but are slow due to the 
size of the training set which forces the computation of 6238 617-dimensional 
Euclidean distances for each classification. Classification of clustered data is 
troublesome with the PLSOM because it tries to approximate a low-dimensional 
manifold in the input space, and in this case there may not be a manifold. Since 
the PLSOM is unsupervised there is no way to direct learning effort to one 
particular property of the input; the PLSOM tries to map them all. In the ISOLET 
set the letter is not the only property encoded in the data; there are all sorts of 
possible data that may or may not be present such as speaker, the speakers' 
age, gender, dialect, smoker/non-smoker and so on. Ideally the PLSOM would 
have one output dimension for each dimension of the embedded manifold, but 
this is impractical in this case. We therefore settled on a 3-dimensional PLSOM 
with 20x20x20 nodes. The PLSOM was trained with random samples from the 
training set for 100000 iterations with neighbourhood size = 2, then each node 
is labelled with the input it responds to. If a node responds to more than 1 
input, a vote is performed and the node is labelled with the input it responds 
to most frequently. If no input gets thrice as many votes as other inputs, or if 
the node does not respond to any input, it is removed. The map is then used to 
classify the test set. This typically results in ca. 2800 remaining nodes, which 
we utilise as the reference vectors for k-NN classification. Thus the PLSOM can 
be seen as a way of speeding up the k-NN algorithm by reducing the number of 
reference vectors. This does however come at a price of accuracy - it achieves 
90.31% accuracy at k=5. 
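
The labelling, pruning and classification steps can be sketched as below. This 
is an illustrative sketch under stated assumptions, not the code used for the 
reported results: the function names are hypothetical, "thrice as many votes" is 
read as the most frequent label having at least three times the votes of the 
runner-up, and ties in the k-NN vote are resolved by decreasing k as described 
for the baseline above. 

```python
import numpy as np

def label_and_prune(weights, train_x, train_y, n_classes, ratio=3):
    """Label each node by majority vote and drop ambiguous or unused nodes."""
    votes = np.zeros((len(weights), n_classes), dtype=int)
    for x, y in zip(train_x, train_y):
        winner = np.argmin(np.linalg.norm(weights - x, axis=1))
        votes[winner, y] += 1
    ordered = np.sort(votes, axis=1)
    best, runner_up = ordered[:, -1], ordered[:, -2]
    # Keep a node only if it won at least once and its top label has at
    # least `ratio` times the votes of the next most frequent label.
    keep = (best > 0) & (best >= ratio * runner_up)
    return weights[keep], np.argmax(votes, axis=1)[keep]

def knn_classify(x, ref_vectors, ref_labels, k=5):
    """k-NN over the surviving nodes; ties are resolved by decreasing k."""
    nearest = np.argsort(np.linalg.norm(ref_vectors - x, axis=1))[:k]
    while True:
        counts = np.bincount(ref_labels[nearest])
        top = np.flatnonzero(counts == counts.max())
        if len(top) == 1 or len(nearest) == 1:
            return int(top[0])
        nearest = nearest[:-1]  # drop the furthest neighbour and vote again
```

Here `weights` would hold the trained node weight vectors (one row per node) and 
`train_y` integer class indices; the surviving rows play the role of the reduced 
set of reference vectors. 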

Figure 19: The 3-beacon navigation mapping by (a) the SOM and (b) the PLSOM in 
the unit square. Both maps have a neighbourhood size of 17. Beacons were 
positioned at (0.3, -0.3), (1.3, 0.5) and (-0.5, 0.8). The origin is in the upper 
left-hand corner. 



6 Discussion 

As indicated above, the PLSOM emphasises maintaining topology over mod- 
elling the input density distribution. In many cases this may be a disadvantage, 
for example with highly clustered data. In other cases, however, this is exactly 
what is needed for a useful mapping of the input space. This happens when 
the input space is nonlinearly mapped to the space from which the inputs are 
selected. This is best illustrated by an example (based on [Keeratipranon and 
Maire(2005)]): A robot navigates a square area by measuring the angles be- 
tween three uniquely identified beacons, giving a 3-dimensional vector with a 
one-to-one relationship to the location of the robot in space. Unfortunately the 
density and topology of the embedded manifold is quite different from the input 
space, causing a large number of samples to fall in a few small areas of the 
3-dimensional input space where the curvature of the embedded Riemannian 
manifold is high. This causes the SOM to try and fit a corner of the map where 
there should be no corner, see Figure 19(a). The density of the input space 
is correctly represented, but the topology of the embedded manifold (which is 
the interesting property in this case) is not. This can of course be corrected 
by selecting the samples carefully in the case of our example since we know the 
bidirectional mapping, but in a real application this would not be known. It 
would therefore be beneficial to find an approach that largely ignores the 
distribution of the input space and instead emphasises the topology of the embedded 
manifold. The PLSOM does this to a higher degree, see Figure 19(b). 
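
To make the beacon example concrete, the following sketch generates the kind of 
3-dimensional input described above. It is only an illustration under 
assumptions: the beacon coordinates are those given in the caption of Figure 19, 
while the function name and the choice of unsigned angles between pairs of 
beacon bearings are ours, since the exact angle convention is not stated here. 

```python
import numpy as np

# Beacon positions taken from the caption of Figure 19.
BEACONS = np.array([[0.3, -0.3], [1.3, 0.5], [-0.5, 0.8]])

def beacon_angles(position, beacons=BEACONS):
    """Return the three angles subtended at `position` by each pair of beacons."""
    rays = beacons - np.asarray(position, dtype=float)
    rays /= np.linalg.norm(rays, axis=1, keepdims=True)  # unit bearing vectors
    pairs = [(0, 1), (1, 2), (2, 0)]
    cosines = [np.clip(np.dot(rays[i], rays[j]), -1.0, 1.0) for i, j in pairs]
    return np.arccos(cosines)

# Uniformly distributed robot positions in the unit square become a
# non-uniformly distributed cloud of 3-dimensional input vectors.
rng = np.random.default_rng(0)
positions = rng.random((1000, 2))
inputs = np.array([beacon_angles(p) for p in positions])
```

Plotting `inputs` against `positions` is one way to see how unevenly the samples 
are spread over the embedded manifold even though the robot positions themselves 
are uniform. 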



7 Conclusion 

We have addressed several problems with the popular SOM algorithm and 
proposed an algorithm that solves them: the PLSOM, which simplifies the overall 
application process by eliminating the problems of finding a suitable learning 
rate and annealing schemes. The PLSOM also reduces the training time. 
Flexibility and ordering of the map are facilitated, and we have shown that the 
PLSOM can be successfully applied to a wide range of problems. Furthermore, the 
PLSOM is able to handle input probability distributions that lead to failure of 
the SOM, although this comes at the cost of lower correspondence between the 
input distribution and the weight density. While the PLSOM does not converge in 
the sense that a SOM will if the learning rate is allowed to reach 0, the same 
effect can be achieved by simply not performing weight updates after a number of 
inputs. All this is achieved without inducing a significant computation time 
increase or memory overhead. Finally we have shown (see Appendix A) that the 
PLSOM is guaranteed to achieve ordering under certain conditions. 

A Proof of guaranteed ordering of a PLSOM 
with 3 nodes and 1-dimensional input and 
output. 

This section will present a proof of guaranteed ordering in a special case of the 
PLSOM. We start out by establishing some lemmas necessary for the proof, 
then we examine the proof and finally we speculate on the implications of the proof. 
For all these proofs we assume that: 

$\epsilon \le 1$ (18) 

Where $\epsilon$ is the normalised distance from the input to the weight of the 
winning (closest) node. 

$h_{c,i} \le 1$ (19) 

Where $h_{c,i}$ is a neighbourhood function which depends on $\epsilon$ and the 
distance in output space between the winning node $c$ and the current node $i$. 
Please see Section 2 for a discussion of the neighbourhood function. 
$c = i \Rightarrow h_{c,i} = 1$ and $h_{c,i}$ is monotonically decreasing with 
increasing distance from $c$. 

$h_{c,n} < h_{c,n+1}$ (20) 

Which implies that $w_{n+1}$ is closer to the input than $w_n$ and will hence 
receive a larger scaling from the neighbourhood function. In the following an 
ordered map will denote a map where all the nodes are monotonically increasing or 
decreasing, which means either (21) or (22) is true. 

$w_n < w_{n+1}, \forall n$ (21) 

$w_n > w_{n+1}, \forall n$ (22) 

Any other map will be called unordered. 

Lemma 1: The weights of a node cannot overshoot the input, i.e. the weights 
of a node cannot move from one side of an input to the other as a result of that 
input. 

Proof: Assume that there is an input $x$, and a node $i$ with weight $w_i$. The 
amount of update to $w_i$, $\Delta w_i$, is equal to $\epsilon h_{c,i}[x - w_i]$ 
as before. Since $\epsilon \le 1$ and $h_{c,i} \le 1$, it is clear that 
$|\Delta w_i| \le |x - w_i|$. Also, $|\Delta w_i| = |x - w_i|$ is only true where 
$\epsilon = 1 \wedge h_{c,i} = 1$, which can only hold for the winning node $c$. 
This proof also applies to the standard SOM where $\alpha(t) \le 1$. (Note that 
for this section we disregard the (t) part of the notation, as it is implied.) 

Lemma 2: There exists no input that can turn an ordered 1 input dimension 
and 1 output dimension map into an unordered one. 

Proof: Proof by contradiction. Assume that $w_n$ and $w_{n+1}$ are weights of an 
ordered 1D map, and $w_n < w_{n+1}$. We will prove that no input $x > w_{n+1}$ 
can move $w_n$ past $w_{n+1}$ (it is easy to see that the converse has to be 
true for $x < w_n$). For $w_n$ to move past $w_{n+1}$, making the map unordered, 
the update of node $n$ must be greater than or equal to the distance between 
node $n$ and node $n + 1$, plus the update to node $n + 1$, so (23) has to be 
true. 

$\Delta w_n \ge w_{n+1} - w_n + \Delta w_{n+1}$ (23) 

Remember that $\Delta w_n = \epsilon h_{c,n}(x - w_n)$ and that 
$x > w_{n+1} > w_n$, which gives us (24). 

$\epsilon h_{c,n}(x - w_{n+1}) + \epsilon h_{c,n}(w_{n+1} - w_n) \ge w_{n+1} - w_n + \epsilon h_{c,n+1}(x - w_{n+1})$ (24) 

It is clear that, because of the premises, Equation (24) cannot be true. This is 
a restatement of the proof that the ordered states are absorbing sets, see 
[Cottrell et al.(1998)Cottrell, Fort, and Pages]. 
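
The step from the premises to the contradiction is not written out; one way to 
see it, offered here as a sketch that uses only assumptions (18) to (20) and the 
premise $x > w_{n+1} > w_n$, is to compare the two sides of the inequality above 
term by term: 

```latex
\begin{align*}
\epsilon h_{c,n}\,(w_{n+1} - w_n) &\le w_{n+1} - w_n
  && \text{since } \epsilon h_{c,n} \le 1 \text{ by (18) and (19)},\\
\epsilon h_{c,n}\,(x - w_{n+1}) &\le \epsilon h_{c,n+1}\,(x - w_{n+1})
  && \text{since } h_{c,n} < h_{c,n+1} \text{ by (20) and } x > w_{n+1}.
\end{align*}
```

Adding the two lines shows that the left-hand side of the required inequality is 
at most the right-hand side. Equality in the second line needs $\epsilon = 0$, 
in which case the first line is strict because $w_{n+1} - w_n > 0$. The left-hand 
side is therefore strictly smaller, so the required inequality cannot hold. 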

That leaves the cases where $w_n < x < w_{n+1}$. Since, according to lemma 1, 
only one node (the winning node) can reach $x$ and no node can overshoot $x$, it 
follows that the nodes must be on the same side of $x$ after the weight update as 
before. Therefore, they cannot become unordered. 

Note that lemmas 1 and 2 hold for PLSOMs with any number of nodes, 
as long as there is one input and one output dimension. This proof is very 
similar to the one given for the SOM by Kohonen [Kohonen(1989)]. 
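
Lemmas 1 and 2 can also be checked numerically. The sketch below is an 
illustration only: the function name is hypothetical, and it assumes a linear 
neighbourhood function with neighbourhood size 2 and a fixed normalising value 
r = 1, with inputs drawn uniformly from [0, 1]. 

```python
import numpy as np

def plsom_update_1d(w, x, r=1.0, beta=2.0):
    """One update of a 1-D map: delta_w_i = eps * h_ci * (x - w_i)."""
    c = int(np.argmin(np.abs(x - w)))            # winning node
    eps = min(abs(x - w[c]) / r, 1.0)            # normalised error, eps <= 1
    lattice = np.abs(np.arange(len(w)) - c)      # distance to c in output space
    h = np.clip(1.0 - lattice / beta, 0.0, 1.0)  # linear neighbourhood, h_cc = 1
    return w + eps * h * (x - w)

rng = np.random.default_rng(1)
w = np.array([0.1, 0.4, 0.9])                    # an ordered 3-node map
for x in rng.random(2000):
    w_new = plsom_update_1d(w, x)
    # Lemma 1: no weight ends up on the other side of the input.
    assert np.all((x - w) * (x - w_new) >= -1e-12)
    # Lemma 2: an ordered map stays ordered.
    assert np.all(np.diff(w_new) > 0)
    w = w_new
```

A finite run like this is of course not a proof; it only illustrates the two 
properties for one starting configuration and one random input sequence. 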

Lemma 3: In the special case where $w_i = a, \forall i$ and some value $a$, any 
input other than $a$ will result in an ordered map. 

Proof: Since all nodes have the same distance to the input, the winning node 
will automatically be the first one, $w_0$. Again, since all nodes are the same 
distance from the input, the amount each node is updated is determined solely by 
the lattice distance from the winning node. Therefore, $w_0$ will move the most, 
$w_1$ a little less and $w_2$ even less and so on, resulting in an ordered map. 

Lemma 4: A 1-dimensional PLSOM with 3 nodes will always reach an ordered 
state given a sufficiently large number of uniformly distributed inputs. 

Proof: The proof is computer-assisted; here we give an outline of the procedure 
for calculating it. 
In short the proof is as follows: 

1. Calculate a scalar field in $U$ expressing how much closer the weights are to 
an attractor point in ordered space after an update, where positive values 
indicate that the weights have moved away from the attractor. 

2. Calculate the gradient of the scalar field. 

3. Calculate the upper bound of the gradient. 

Figure 20: The unordered subspace U. All other unordered states are mirrors 
or inversions of states in this subspace. 

4. Given the upper bound of the gradient we calculate the size of a sample 
grid. 

5. Given the upper bound of the gradient and the expected update at each 
sample point, the expected update must point towards the attractor in the 
vicinity of the sample point. If this holds for all sample points, it holds 
for the whole subspace of unordered weights. 

The weights of the three PLSOM nodes are denoted $w_0, w_1, w_2$. The 
conditions under which the proof is calculated are as follows: 

• Uniformly distributed input in the range [0, 1]. 

• $w_0 < w_2 < w_1$ 

• $w_0 < w_1$ 

• $0 < w_0 < 1$ 

• $0 < w_2 < 1$ 

• $0 < w_1 < 1$ 

• Linear neighbourhood function, for simplicity. 

The weights $w_0$, $w_1$ and $w_2$ can be seen as coordinates in a 3-dimensional 
space, where all possible configurations fill the unit cube. The subspace spanned 
by the constraints above is denoted $U$, see Figure 20. This subspace represents 
the only way in which a 1D 3-node map can be unordered; all other unordered 
states are inversions or mirrors of this state, see Figure 21. An ordered map 
fulfils one of the following configurations: 

1. $w_0 < w_1 < w_2$ 

2. $w_0 > w_1 > w_2$ 

Figure 21: All unordered states in the volume of all possible states. 

The ordered subspace fills the volume drawn in Figure 22. Now on to the proof 
proper: 

We introduce the concept of the expected update vector: Given a weight 
configuration and an input probability density function we can compute the 
distance and direction in which each node is most likely to move; this is given 
in (25). 

$u = [u_0, u_1, u_2]$ (25) 
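Where $u$ is the expected update vector, its elements given by (26). 

$u_n = \int_0^{\frac{w_0 + w_2}{2}} f_n(0)\,dx + \int_{\frac{w_0 + w_2}{2}}^{\frac{w_2 + w_1}{2}} f_n(2)\,dx + \int_{\frac{w_2 + w_1}{2}}^{1} f_n(1)\,dx$ (26) 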

Where $f_n(c)$ is the expected update of node $n$ given $c$ as the winning node. 
As mentioned above, we use a simplified version of $f_n(c)$ to facilitate 
integration, see (27). 

$f_n(c) = \frac{|x - w_c|}{r}\left(1 - \frac{|c - n|}{\beta}\right)(x - w_n)$ (27) 

Where $r$ is the normalising variable and $\beta$ is the neighbourhood size. For 
simplicity we set $r = 1$ and $\beta = 2$. $x$ is the input, uniformly 
distributed in the [0, 1] interval. 

As mentioned above, the three input weights of the nodes, $w_0$, $w_1$ and $w_2$, 
can be seen as coordinates in a 3-dimensional Euclidean space, of which $U$ is a 
subspace. 

Every point w in this subspace is associated with an expected update u: The 
position before an input is represented by w, and the most likely position after 
a uniformly distributed random input is w + u. 
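
As an illustration, the expected update vector can be approximated numerically. 
The sketch below is not the computer-assisted proof itself: the function name 
and the use of a midpoint rule are our own choices, and it assumes the 
simplified linear neighbourhood with r = 1 and neighbourhood size 2, with the 
winning node taken as the node whose weight is closest to the input. 

```python
import numpy as np

def expected_update(w, r=1.0, beta=2.0, n_samples=20000):
    """Approximate u = E[delta w] for x ~ U(0, 1) with the midpoint rule."""
    w = np.asarray(w, dtype=float)
    x = (np.arange(n_samples) + 0.5) / n_samples       # midpoints in [0, 1]
    dist = np.abs(x[:, None] - w[None, :])             # |x - w_n| per sample
    c = np.argmin(dist, axis=1)                        # winning node per sample
    eps = dist[np.arange(n_samples), c] / r            # |x - w_c| / r
    lattice = np.abs(np.arange(len(w))[None, :] - c[:, None])
    h = 1.0 - lattice / beta                           # linear neighbourhood
    updates = eps[:, None] * h * (x[:, None] - w[None, :])
    return updates.mean(axis=0)                        # average over the inputs

w = np.array([0.2, 0.8, 0.5])    # a point in U: w_0 < w_2 < w_1
u = expected_update(w)
```

A useful sanity check on such a routine is mirror symmetry: reflecting every 
weight about 0.5 should flip the sign of every component of the expected update. 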

Now we introduce a point denoted $t$ in the ordered subspace of the unit cube, 
which is the attractor in the dynamic system in $U$. In other words, all the 
expected weight updates in $U$ will bring the weight vectors closer to the 
attractor, and hence closer to the ordered subspace, see (28). 

$\|w + u - t\|_1 - \|w - t\|_1 < 0, \quad w \in U$ (28) 

where $\|\cdot\|_1$ is the $L^1$-norm or Manhattan distance. The $L^1$-norm was 
chosen because it produces a simpler expression than the $L^2$-norm. $t$ has been 
found empirically to be close to [^^^, jq]; the exact location is not important 
to this proof. Equation (28) defines a 3-dimensional scalar field and in order to 
prove negativity we compute the upper bound of the length of the gradient, 
16.5. With this estimated upper bound we must check that no point is further 
away from a sample point than $1.333 \times 10^{-4}$, and that no sample point 
has a value greater than $d = -2.2 \times 10^{-3}$. This gives Equation 




(29) 

Where $s$ is the spacing of the 3-dimensional grid of sample points, 
$1.53959 \times 10^{-4}$. This equals roughly $4.57 \times 10^{10}$ sample points 
to check, another reason for choosing the simpler $L^1$-norm. The necessary 
calculations are easily performed by a low-end desktop computer in less than 12 
hours. Since the distance from the weight position to the attractor is steadily 
diminishing, it follows that the weight position will, given enough consecutive 
inputs, come close enough to the attractor to reach ordered space. 

Figure 23: Evolution of the weight positions of a 64-node 2D PLSOM initialised 
to a difficult position. (a) Initial state. (b) After 400 inputs. (c) After 480 
inputs. (d) After 650 inputs. Neighbourhood size is 11, minimum neighbourhood 
size is 0. To simulate what will happen if this configuration appears late in 
training, we force an r value of 0.65. 
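
The sample-grid figures quoted in this proof can be cross-checked with a short 
calculation. The sketch below assumes a cubic sample grid, so that the point 
furthest from any sample lies at the centre of a grid cell, and uses the fact 
that the region $w_0 < w_2 < w_1$ occupies one sixth of the unit cube; neither 
assumption is spelled out in the text. 

```latex
\begin{align*}
\text{largest change between a point and its nearest sample}
  &\le 16.5 \times 1.333\times10^{-4} \approx 2.2\times10^{-3},\\
\text{largest distance to a sample for spacing } s
  &= \tfrac{\sqrt{3}}{2}\,s
   = \tfrac{\sqrt{3}}{2}\times 1.53959\times10^{-4}
   \approx 1.333\times10^{-4},\\
\text{number of samples in } U
  &\approx \frac{1/6}{s^{3}}
   = \frac{1/6}{(1.53959\times10^{-4})^{3}}
   \approx 4.57\times10^{10}.
\end{align*}
```

Under these assumptions the quoted gradient bound, sample threshold, grid spacing 
and sample count are mutually consistent. 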

Whether this proof is extensible to networks with more than 3 nodes and 
more than 1-dimensional input is at this point uncertain, but the image sequence 
in Figures 23(a)-23(d) certainly suggests the possibility. 

Kohonen [Kohonen(1989)] mentions a proof (see [Cottrell et al.(1998)Cottrell, 
Fort, and Pages, Cottrell and Fort(1987)]) of ordering of a simplified SOM based 
on the probability of an ordering input happening and an infinite number of 
inputs. This proof in essence relies on the fact that if there is a sequence of 
inputs such that the map will become ordered and one generates a sufficiently 
large number of inputs, the probability of encountering the ordering sequence of 
inputs approaches 1. The proof just presented here establishes that for any 
configuration, the expected update is in the direction of ordering for any single 
input. It also shows the existence of an ordered attractor for the dynamical 
system without having to satisfy the Robbins-Monro [Robbins and Monro(1951)] 
condition. 

Conjecture: Any 1-dimensional PLSOM where only the immediate neigh- 
bours of the winning node are updated can be seen as a chain of 3-node net- 
works, where each subnetwork is guaranteed to become ordered, therefore the 
whole network will become ordered. This is similar to the proof given in [Ko- 
honen(1989)] for the SOM, albeit the authors are not confident enough that it 
also applies to the PLSOM to posit it as more than a conjecture. 